Generating Robust Features for Style-independent Labeling of Bibliographic Fields in Medical Journal Articles

Authors

  • Song Mao
  • Jongwoo Kim
  • Daniel X. Le
  • George R. Thoma
Abstract

Bibliographic data such as title, author, affiliation, and abstract are crucial for indexing biomedical journal articles. The Medical Article Records System (MARS) has been developed at the National Library of Medicine (NLM) to automate bibliographic data extraction for MEDLINE®, the NLM’s premier database of citations to the biomedical literature. Automatic extraction of bibliographic data involves assigning logical labels (title, author, affiliation, and abstract) to homogeneous regions, or zones, on page images. While an OCR- and rule-based labeling module in MARS (called ZoneCzar) can reliably label medical journals with regular layout styles, it cannot accurately label journals with arbitrary or unusual layout styles, and new rules have to be created manually for these journals. Furthermore, OCR zoning errors, particularly merging errors, can greatly affect the labeling accuracy of ZoneCzar. In this paper, we describe an algorithm for automatic generation of robust features that are used by the labeling algorithm to perform style-independent labeling.

Similar resources

Style-independent document labeling: design and performance evaluation

The Medical Article Records System or MARS has been developed at the U.S. National Library of Medicine (NLM) for automated data entry of bibliographical information from medical journals into MEDLINE®, the premier bibliographic citation database at NLM. Currently, a rule-based algorithm (called ZoneCzar) is used for labeling important bibliographical fields (title, author, affiliation, and abst...

Radiology, nuclear medicine, and medical imaging: a bibliometric study in Iran

Introduction: Nowadays, science mapping is considered an excellent technique for decision-makers to find solutions for problems in research planning and development. In this work, we aimed to depict a science map of “radiology, nuclear medicine, and medical imaging” in Iran. Methods: All publications indexed in Thomson Reuters Web of Science database in the fields mentioned above with at least...

Automated Labeling from Biomedical Journals published in Foreign Languages

An automated labeling (AL) module is developed to produce bibliographic records such as English title, vernacular title, author, affiliation, and English abstract from biomedical articles published in foreign language journals. Optical character recognition (OCR) output from scanned biomedical journals is used in this labeling process. Since frequently occurring words in a zone are important fe...

Automated Labeling Algorithms for Biomedical Document Images

The National Library of Medicine (NLM) has developed an automated system, named Medical Article Records System (MARS), to process bibliographic data (title, authors, affiliation, abstract, etc.) in biomedical journal articles for its MEDLINE database. This paper describes a labeling module in MARS, which automatically extracts the bibliographic data in biomedical journal articles. The label...

Citation analysis of articles in the Journal of Qom University of Medical Sciences, 1386-1391

Background and Objectives: Given the important role of journals in presenting the most up-to-date scholarly information, rigorous scientific evaluation of such sources is essential. The present research was carried out to determine the citation status in the articles of the Journal of Qom University of Medical Sciences. Methods: This research, as a descriptive...


Publication date: 2003